Search CORE

20 research outputs found

A GPU-Computing Approach to Solar Stokes Profile Inversion

Author: Brian J. Harker
Darwin
Davis
Eydenberg
Gray
Harker
Hestroffer
Karr
Kenneth J. Mighell
Landi Degl'Innocenti
Levenberg
NVIDIA Corp.
NVIDIA Corp.
Pierce
Press
Rachkovsky
Rees
Rees
Sanchez Almeida
Socas-Navarro
Unno
Publication venue: 'IOP Publishing'
Publication date: 01/08/2012
Field of study

We present a new computational approach to the inversion of solar photospheric Stokes polarization profiles, under the Milne-Eddington model, for vector magnetography. Our code, named GENESIS (GENEtic Stokes Inversion Strategy), employs multi-threaded parallel-processing techniques to harness the computing power of graphics processing units GPUs, along with algorithms designed to exploit the inherent parallelism of the Stokes inversion problem. Using a genetic algorithm (GA) engineered specifically for use with a GPU, we produce full-disc maps of the photospheric vector magnetic field from polarized spectral line observations recorded by the Synoptic Optical Long-term Investigations of the Sun (SOLIS) Vector Spectromagnetograph (VSM) instrument. We show the advantages of pairing a population-parallel genetic algorithm with data-parallel GPU-computing techniques, and present an overview of the Stokes inversion problem, including a description of our adaptation to the GPU-computing paradigm. Full-disc vector magnetograms derived by this method are shown, using SOLIS/VSM data observed on 2008 March 28 at 15:45 UT

arXiv.org e-Print Archive

Crossref

Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors

Author: AMD
Anderson
Badia
Buttari
Catalán
Chitlur
Chronaki
Demmel
Dennard
Dongarra
Dongarra
Golub
Goto
Hill
IBM
Intel Corp
Kumar
Lawson
Moore
Morad
NVIDIA
Quintana-Ortí
Servat
Smith
Strazdins
Van Zee
Whaley
Winter
Zee
Publication venue: 'Elsevier BV'
Publication date: 01/03/2018
Field of study

[EN] Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS (and LAPACK) functionality for many current multi-threaded architectures, the adaption of these libraries for asymmetric multicore processors (AMPs) is still pending. In this paper we address this challenge by developing an asymmetry-aware implementation of the BLAS, based on the BLIS framework, and tailored for AMPs equipped with two types of cores: fast/power-hungry versus slow/energy-efficient. For this purpose, we integrate coarse-grain and fine-grain parallelization strategies into the library routines which, respectively, dynamically distribute the workload between the two core types and statically repartition this work among the cores of the same type. Our results on an ARM (R) big.LITTLE (TM) processor embedded in the Exynos 5422 SoC, using the asymmetry-aware version of the BLAS and a plain migration of the legacy version of LAPACK, experimentally assess the benefits, limitations, and potential of this approach from the perspectives of both throughput and energy efficiency. (C) 2016 Elsevier B.V. All rights reserved.The researchers from Universidad Jaume I were supported by projects CICYT TIN2011-23283 and TIN2014-53495-R of MINECO and FEDER, and the FPU program of MECD. The researcher from Universidad Complutense de Madrid was supported by project CICYT TIN2015-65277-R. The researcher from Universitat Politecnica de Catalunya was supported by projects TIN2015-65316-P from the Spanish Ministry of Education and 2014 SGR 1051 from the Generalitat de Catalunya, Dep. dinnovacio, Universitats i Empresa.Catalán, S.; Herrero, JR.; Igual Peña, FD.; Rodríguez-Sánchez, R.; Quintana Ortí, ES.; Adeniyi-Jones, C. (2018). Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors. Journal of Computational Science. 25:140-151. https://doi.org/10.1016/j.jocs.2016.10.020S1401512

Crossref

Repositori Institucional de la Universitat Jaume I

RiuNet

Fine-grained bit-flip protection for relaxation methods

Author: Aliaga
Anzt
Anzt
Anzt
Bridges
Bronevetsky
Calhoun
Chalios
Chazan
Chen
Chow
Duff
Duranton
Elliott
Elliott
Frommer
Hennessy
HP Corp
Innovative Computing Lab
Kanter
Karpuzcu
Kogge
Lucas
Moore
NVIDIA Corporation
Saad
Sao
Trottenberg
Venkataramani
Publication venue: 'Elsevier BV'
Publication date: 01/09/2019
Field of study

[EN] Resilience is considered a challenging under-addressed issue that the high performance computing community (HPC) will have to face in order to produce reliable Exascale systems by the beginning of the next decade. As part of a push toward a resilient HPC ecosystem, in this paper we propose an error-resilient iterative solver for sparse linear systems based on stationary component-wise relaxation methods. Starting from a plain implementation of the Jacobi iteration, our approach introduces a low-cost component-wise technique that detects bit-flips, rejecting some component updates, and turning the initial synchronized solver into an asynchronous iteration. Our experimental study with sparse incomplete factorizations from a collection of real-world applications, and a practical GPU implementation, exposes the convergence delay incurred by the fault-tolerant implementation and its practical performance.This material is based upon work supported in part by the U.S. Department of Energy (Award Number DE-SC-0010042) and NVIDIA. E. S. Quintana-Orti was supported by project CICYT TIN2014-53495-R of MINECO and FEDER.Anzt, H.; Dongarra, J.; Quintana Ortí, ES. (2019). Fine-grained bit-flip protection for relaxation methods. Journal of Computational Science. 36:1-11. https://doi.org/10.1016/j.jocs.2016.11.013S11136Chow, E., & Patel, A. (2015). Fine-Grained Parallel Incomplete LU Factorization. SIAM Journal on Scientific Computing, 37(2), C169-C193. doi:10.1137/140968896Karpuzcu, U. R., Kim, N. S., & Torrellas, J. (2013). Coping with Parametric Variation at Near-Threshold Voltages. IEEE Micro, 33(4), 6-14. doi:10.1109/mm.2013.71Bronevetsky, G., & de Supinski, B. (2008). Soft error vulnerability of iterative linear algebra methods. Proceedings of the 22nd annual international conference on Supercomputing - ICS ’08. doi:10.1145/1375527.1375552Sao, P., & Vuduc, R. (2013). Self-stabilizing iterative solvers. Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - ScalA ’13. doi:10.1145/2530268.2530272Calhoun, J., Snir, M., Olson, L., & Garzaran, M. (2015). Understanding the Propagation of Error Due to a Silent Data Corruption in a Sparse Matrix Vector Multiply. 2015 IEEE International Conference on Cluster Computing. doi:10.1109/cluster.2015.101Chazan, D., & Miranker, W. (1969). Chaotic relaxation. Linear Algebra and its Applications, 2(2), 199-222. doi:10.1016/0024-3795(69)90028-7Frommer, A., & Szyld, D. B. (2000). On asynchronous iterations. Journal of Computational and Applied Mathematics, 123(1-2), 201-216. doi:10.1016/s0377-0427(00)00409-xDuff, I. S., & Meurant, G. A. (1989). The effect of ordering on preconditioned conjugate gradients. BIT, 29(4), 635-657. doi:10.1007/bf01932738Aliaga, J. I., Barreda, M., Dolz, M. F., Martín, A. F., Mayo, R., & Quintana-Ortí, E. S. (2014). Assessing the impact of the CPU power-saving modes on the task-parallel solution of sparse linear systems. Cluster Computing, 17(4), 1335-1348. doi:10.1007/s10586-014-0402-

Crossref

Repositori Institucional de la Universitat Jaume I

RiuNet

Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems

Author: ACC.
AMD.
AMD.
Amini Mehdi
Dominik Grewe
Grewe Dominik
Kayiran Onur
LLVM.
Michael F. P. O’boyle
NVIDIA Corp.
Ogilvie William
PathScale Inc.
Portland Group
Project Omini Compiler
Sung Jui
University of Illinois at Urbana-Champaign (UIUC).
Wang Zheng
Zheng Wang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015
Field of study

General-purpose GPU-based systems are highly attractive, as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler-based approach to automatically generate optimized OpenCL code from data parallel OpenMP programs for GPUs. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses automatic machine learning to build a predictive model to determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multicore host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on distinct GPU-based systems. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) on Core i7/NVIDIA GeForce GTX580 and Core i7/AMD Radeon 7970 platforms, respectively, over a sequential baseline. Our approach achieves, on average, greater than 10× speedups over two state-of-the-art automatic GPU code generators

Crossref

Lancaster E-Prints

White Rose Research Online

GPGPU-based explicit finite element computations for applications in biomechanics: the performance of material models, element technologies, and hardware generations

Author: Belytchko T
Belytschko T
D. M. Pierce
Dongarra J
Grytz R
Göddeke D
Hallquist JO
Hughes TJR
J. Vander Sloten
N. Famaey
Nvidia Corp
Nvidia Corp
Nvidia Corp
Nvidia Corp
Nvidia Corp
Nvidia Corp
Nvidia Corp
V. Strbac
Publication venue: 'Informa UK Limited'
Publication date
Field of study

Crossref

International audienceUsing power meters and performance counters to get insight on system's behavior in terms of power consumption is common nowadays. The values coming from these external or internal meters are usually used directly by the research community, for instance to derive higher-level power models with learning techniques or to use them in decision tools such as schedulers in HPC and Cloud Computing. While it is reasonable when one wants only to have a broad view on the power consumption, they can not be used directly in most cases: We prove in this article that the problems of distributed measure and hardware limits are way more complex and create bias, and we give the keys to understand and chose the proper methodology to handle these bias to obtain relevant values for enhanced usage. A generic methodology is analyzed and its main lessons extracted for a direct usage by the research community to master system and power measures for servers in datacenter

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte